Explainable AI - Categories

  1. Perturbation-based: Check what happens to the classifier or regressor when the input is changed, e.g., masking parts of an input image to see which features matter most for the prediction.
  1. Function-based: Take a functional view of the model and interpret the results through that function, e.g., by approximating it with a Taylor decomposition (expansion).
  1. Sampling-based: Approximate the prediction locally (e.g., Gradient * Input).
  1. Structure-based: Use the structure of the model to explain the prediction (e.g., Layer-wise Relevance Propagation, LRP): decompose the function along the model's structure, explain the simpler pieces, and aggregate them afterwards.
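The perturbation-based idea above can be sketched in a few lines. This is a minimal occlusion example with a toy linear scorer standing in for any classifier; the `model`, `weights`, and `baseline` value are illustrative assumptions, not part of any particular library:

```python
import numpy as np

# Toy "model": a linear scorer over 4 features (stand-in for any classifier).
weights = np.array([0.5, -1.0, 2.0, 0.1])

def model(x):
    return float(weights @ x)

def occlusion_importance(x, baseline=0.0):
    """Perturbation-based attribution: mask each feature in turn
    and record how much the model's output drops."""
    base_score = model(x)
    importances = []
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = baseline          # "occlude" feature i
        importances.append(base_score - model(x_masked))
    return np.array(importances)

x = np.array([1.0, 1.0, 1.0, 1.0])
print(occlusion_importance(x))  # -> [ 0.5 -1.   2.   0.1]
```

For this linear toy, occluding feature i removes exactly its weight's contribution, so the importance scores recover the weights; for images the same loop slides a masking patch over pixel regions instead.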

Intuition: Every layer in a NN, for instance, is a composition of simpler functions (e.g., ReLU).

SHAP Algorithm

SHAP assigns each feature an importance value for a particular prediction.

1. They introduce the perspective of viewing any explanation of a model’s prediction as a model itself, which they term the explanation model, by defining the class of additive feature attribution methods.

For complex models, such as ensemble methods or deep networks, we cannot use the original model as its own best explanation because it is not easy to understand. Instead, we must use a simpler explanation model, which we define as any interpretable approximation of the original model. A surprising attribute of the class of additive feature attribution methods is the presence of a single unique solution in this class with three desirable properties (described below).
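An additive feature attribution method can be written as g(z') = φ₀ + Σᵢ φᵢ z'ᵢ, where z' is a binary vector of "present" features. A minimal sketch, with hypothetical attribution values chosen for illustration:

```python
import numpy as np

def explanation_model(phi0, phi, z):
    """Additive feature attribution: g(z') = phi0 + sum_i phi_i * z'_i,
    where z' is a binary vector indicating which features are present."""
    return phi0 + np.dot(phi, z)

# Hypothetical attributions for a 3-feature prediction.
phi0 = 0.2                        # base value (average model output)
phi = np.array([0.5, -0.3, 0.1])  # per-feature contributions

# With all features present, g reconstructs the model's prediction
# (the "local accuracy" property): 0.2 + 0.5 - 0.3 + 0.1 = 0.5
print(explanation_model(phi0, phi, np.ones(3)))
```

LIME, DeepLIFT, and classic Shapley value estimation all fit this template; they differ only in how φ₀ and the φᵢ are computed.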

2. They then show that game theory results guaranteeing a unique solution apply to the entire class of additive feature attribution methods and propose SHAP values as a unified measure of feature importance that various methods approximate.
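The unique solution is the Shapley value: φᵢ = Σ_{S ⊆ F\{i}} |S|!(|F|-|S|-1)!/|F|! · [f(S ∪ {i}) - f(S)]. A brute-force sketch of this formula, assuming absent features are replaced by a user-chosen baseline (this enumerates all coalitions, so it is only feasible for a handful of features; SHAP's estimators approximate it):

```python
import itertools
import math
import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions S:
    phi_i = sum_S |S|! (n-|S|-1)! / n! * [f(S u {i}) - f(S)],
    where features outside the coalition take their baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            for S in itertools.combinations(rest, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                x_S = baseline.copy()
                x_S[list(S)] = x[list(S)]     # coalition S present
                x_Si = x_S.copy()
                x_Si[i] = x[i]                # coalition S plus feature i
                phi[i] += weight * (f(x_Si) - f(x_S))
    return phi

# Toy linear model: Shapley values reduce to w_i * (x_i - baseline_i).
w = np.array([1.0, 2.0, -1.0])
f = lambda x: float(w @ x)
print(shapley_values(f, np.array([1.0, 1.0, 1.0]), np.zeros(3)))  # -> [ 1.  2. -1.]
```

The linear case makes the uniqueness claim concrete: any method in the additive class that satisfies the three properties must return exactly these values.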

3. They propose new SHAP value estimation methods and demonstrate that they are better aligned with human intuition, as measured by user studies, and more effectively discriminate among model output classes than several existing methods.